Overview

Dataset statistics

Number of variables: 27
Number of observations: 108785
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 75.2 MiB
Average record size in memory: 725.3 B

Variable types

NUM: 16
CAT: 9
BOOL: 2

Reproduction

Analysis started: 2020-05-05 06:15:53.588805
Analysis finished: 2020-05-05 06:18:58.309917
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

country has a high cardinality: 173 distinct values [High cardinality]
arrival_date has a high cardinality: 730 distinct values [High cardinality]
babies is highly skewed (γ1 = 25.60230831) [Skewed]
previous_cancellations is highly skewed (γ1 = 23.5072585) [Skewed]
previous_bookings_not_canceled is highly skewed (γ1 = 22.49048645) [Skewed]
arrival_date only contains datetime values, but is categorical. Consider applying pd.to_datetime(). [Type]
lead_time has 5936 (5.5%) zeros [Zeros]
stays_in_weekend_nights has 48197 (44.3%) zeros [Zeros]
stays_in_week_nights has 6971 (6.4%) zeros [Zeros]
children has 101830 (93.6%) zeros [Zeros]
babies has 107951 (99.2%) zeros [Zeros]
previous_cancellations has 102366 (94.1%) zeros [Zeros]
previous_bookings_not_canceled has 105388 (96.9%) zeros [Zeros]
booking_changes has 92781 (85.3%) zeros [Zeros]
agent has 15530 (14.3%) zeros [Zeros]
company has 102214 (94.0%) zeros [Zeros]
days_in_waiting_list has 105101 (96.6%) zeros [Zeros]
adr has 1742 (1.6%) zeros [Zeros]
required_car_parking_spaces has 101966 (93.7%) zeros [Zeros]
total_of_special_requests has 65912 (60.6%) zeros [Zeros]
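
The quantities behind these warnings can be recomputed directly with pandas. The following is a minimal sketch, not the profiler's own implementation: the small DataFrame is an illustrative stand-in (the babies and country values are made up), and the thresholds at which the report raises a warning are not stated here, so none are applied.

```python
import pandas as pd

# Illustrative stand-in for the real dataset; the babies and country
# values are made up, the date strings use the report's YYYY-MM-DD form.
df = pd.DataFrame({
    "babies": [0, 0, 0, 0, 0, 0, 0, 1, 0, 2],
    "country": ["PRT", "GBR", "PRT", "FRA", "PRT",
                "ESP", "PRT", "GBR", "DEU", "PRT"],
    "arrival_date": ["2017-02-04", "2016-03-04", "2017-02-01", "2016-10-05",
                     "2017-06-29", "2016-04-07", "2016-04-10", "2016-03-27",
                     "2015-09-14", "2017-01-16"],
})

# Cardinality: number of distinct values in a categorical column.
n_distinct = df["country"].nunique()

# Zeros: share of rows where a numeric column equals zero.
zero_share = (df["babies"] == 0).mean()

# Skewness: pandas' sample skewness of a numeric column.
skewness = df["babies"].skew()

# Type: parse the date strings into datetime64 values, as the warning
# suggests. An explicit format makes malformed entries raise an error.
df["arrival_date"] = pd.to_datetime(df["arrival_date"], format="%Y-%m-%d")

print(n_distinct, zero_share, skewness, df["arrival_date"].dtype)
```

Once arrival_date is a datetime column, accessors such as `df["arrival_date"].dt.year` become available and the column no longer needs to be profiled as a 730-level categorical.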

Variables

id
Real number (ℝ≥0)

UNIFORM
UNIQUE
Distinct count: 108785
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54393.0
Minimum: 1
Maximum: 108785
Zeros: 0
Zeros (%): 0.0%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5440.2
Q1: 27197
Median: 54393
Q3: 81589
95-th percentile: 103345.8
Maximum: 108785
Range: 108784
Interquartile range (IQR): 54392

Descriptive statistics

Standard deviation: 31403.66885
Coefficient of variation (CV): 0.5773476156
Kurtosis: -1.2
Mean: 54393
Mean Absolute Deviation (MAD): 27196.25
Skewness: 0
Sum: 5917142505
Variance: 986190417.5
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.00000e+00 1.08785e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
2047 1 < 0.1%
 
21792 1 < 0.1%
 
17698 1 < 0.1%
 
19747 1 < 0.1%
 
29988 1 < 0.1%
 
32037 1 < 0.1%
 
25894 1 < 0.1%
 
27943 1 < 0.1%
 
5416 1 < 0.1%
 
7465 1 < 0.1%
 
Other values (108775) 108775 > 99.9%
 
ValueCountFrequency (%) 
1 1 < 0.1%
 
2 1 < 0.1%
 
3 1 < 0.1%
 
4 1 < 0.1%
 
5 1 < 0.1%
 
ValueCountFrequency (%) 
108785 1 < 0.1%
 
108784 1 < 0.1%
 
108783 1 < 0.1%
 
108782 1 < 0.1%
 
108781 1 < 0.1%
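
Because id takes every integer from 1 to 108785 exactly once, its statistics can be reproduced without the original file. The sketch below is an independent check with NumPy, not the profiler's code; it assumes the sample (ddof=1) variance and linear-interpolation percentiles. Under those assumptions the value reported as MAD (27196.25) matches the mean of the absolute deviations from the mean.

```python
import numpy as np

# id runs from 1 to 108785 with no gaps or repeats (per the report).
n = 108785
ids = np.arange(1, n + 1, dtype=np.float64)

mean = ids.mean()
variance = ids.var(ddof=1)               # sample variance
std = ids.std(ddof=1)                    # sample standard deviation
cv = std / mean                          # coefficient of variation
mean_abs_dev = np.abs(ids - mean).mean() # mean absolute deviation

# Percentiles with NumPy's default linear interpolation.
p5, q1, median, q3, p95 = np.percentile(ids, [5, 25, 50, 75, 95])
iqr = q3 - q1

print(mean, variance, std, cv, mean_abs_dev)
print(p5, q1, median, q3, p95, iqr)
```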
 

hotel
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 850.0 KiB
ValueCountFrequency (%) 
City Hotel 72364 66.5%
 
Resort Hotel 36421 33.5%
 

Length

Max length: 12
Mean length: 10.66959599
Min length: 10
ValueCountFrequency (%) 
Lowercase_Letter 8 66.7%
 
Uppercase_Letter 3 25.0%
 
Space_Separator 1 8.3%
 
ValueCountFrequency (%) 
Latin 11 91.7%
 
Common 1 8.3%
 
ValueCountFrequency (%) 
ASCII 12 100.0%
 

is_canceled
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 850.0 KiB
ValueCountFrequency (%) 
0 68460 62.9%
 
1 40325 37.1%
 

lead_time
Real number (ℝ≥0)

ZEROS
Distinct count: 464
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.15984740543273
Minimum: 0
Maximum: 737
Zeros: 5936
Zeros (%): 5.5%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 17
Median: 65
Q3: 152
95-th percentile: 316
Maximum: 737
Range: 737
Interquartile range (IQR): 135

Descriptive statistics

Standard deviation: 104.95313
Coefficient of variation (CV): 1.047856329
Kurtosis: 1.866543146
Mean: 100.1598474
Mean Absolute Deviation (MAD): 82.6484831
Skewness: 1.396455732
Sum: 10895889
Variance: 11015.1595
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000e+00 5.000e-01 1.500e+00 2.500e+00 4.500e+00 ... 6.065e+02 6.240e+02 6.275e+02 6.690e+02 7.370e+02], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 5936 5.5%
 
1 3296 3.0%
 
2 1939 1.8%
 
3 1720 1.6%
 
4 1625 1.5%
 
5 1479 1.4%
 
6 1364 1.3%
 
7 1271 1.2%
 
8 1075 1.0%
 
12 1016 0.9%
 
Other values (454) 88064 81.0%
 
ValueCountFrequency (%) 
0 5936 5.5%
 
1 3296 3.0%
 
2 1939 1.8%
 
3 1720 1.6%
 
4 1625 1.5%
 
ValueCountFrequency (%) 
737 1 < 0.1%
 
709 1 < 0.1%
 
629 17 < 0.1%
 
626 30 < 0.1%
 
622 17 < 0.1%
 

stays_in_weekend_nights
Real number (ℝ≥0)

ZEROS
Distinct count: 17
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9094452360159948
Minimum: 0
Maximum: 19
Zeros: 48197
Zeros (%): 44.3%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 2
Maximum: 19
Range: 19
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 0.9925027134
Coefficient of variation (CV): 1.091327629
Kurtosis: 7.060820167
Mean: 0.909445236
Mean Absolute Deviation (MAD): 0.8058561758
Skewness: 1.398232453
Sum: 98934
Variance: 0.9850616361
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 6.5 7.5 8.5 12.5 19. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 48197 44.3%
 
2 29599 27.2%
 
1 28012 25.7%
 
4 1605 1.5%
 
3 1047 1.0%
 
6 149 0.1%
 
5 72 0.1%
 
8 57 0.1%
 
7 19 < 0.1%
 
9 9 < 0.1%
 
Other values (7) 19 < 0.1%
 
ValueCountFrequency (%) 
0 48197 44.3%
 
1 28012 25.7%
 
2 29599 27.2%
 
3 1047 1.0%
 
4 1605 1.5%
 
ValueCountFrequency (%) 
19 1 < 0.1%
 
18 1 < 0.1%
 
16 2 < 0.1%
 
14 1 < 0.1%
 
13 2 < 0.1%
 

stays_in_week_nights
Real number (ℝ≥0)

ZEROS
Distinct count: 33
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.4649354230822262
Minimum: 0
Maximum: 50
Zeros: 6971
Zeros (%): 6.4%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 3
95-th percentile: 5
Maximum: 50
Range: 50
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.887033616
Coefficient of variation (CV): 0.7655509343
Kurtosis: 24.67681927
Mean: 2.464935423
Mean Absolute Deviation (MAD): 1.337468635
Skewness: 2.933117079
Sum: 268148
Variance: 3.560895869
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 18.5 20.5 21.5 25.5 50. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
2 31367 28.8%
 
1 27975 25.7%
 
3 20341 18.7%
 
5 9503 8.7%
 
4 8457 7.8%
 
0 6971 6.4%
 
6 1270 1.2%
 
10 921 0.8%
 
7 891 0.8%
 
8 526 0.5%
 
Other values (23) 563 0.5%
 
ValueCountFrequency (%) 
0 6971 6.4%
 
1 27975 25.7%
 
2 31367 28.8%
 
3 20341 18.7%
 
4 8457 7.8%
 
ValueCountFrequency (%) 
50 1 < 0.1%
 
42 1 < 0.1%
 
40 2 < 0.1%
 
34 1 < 0.1%
 
33 1 < 0.1%
 

adults
Real number (ℝ≥0)

Distinct count: 14
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.8452635933262858
Minimum: 0
Maximum: 55
Zeros: 205
Zeros (%): 0.2%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 2
Q3: 2
95-th percentile: 2
Maximum: 55
Range: 55
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.5824939318
Coefficient of variation (CV): 0.3156697688
Kurtosis: 1452.599073
Mean: 1.845263593
Mean Absolute Deviation (MAD): 0.3483700911
Skewness: 19.83475462
Sum: 200737
Variance: 0.3392991806
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 4.5 55. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
2 81464 74.9%
 
1 21970 20.2%
 
3 5076 4.7%
 
0 205 0.2%
 
4 54 < 0.1%
 
26 5 < 0.1%
 
27 2 < 0.1%
 
20 2 < 0.1%
 
5 2 < 0.1%
 
55 1 < 0.1%
 
Other values (4) 4 < 0.1%
 
ValueCountFrequency (%) 
0 205 0.2%
 
1 21970 20.2%
 
2 81464 74.9%
 
3 5076 4.7%
 
4 54 < 0.1%
 
ValueCountFrequency (%) 
55 1 < 0.1%
 
50 1 < 0.1%
 
40 1 < 0.1%
 
27 2 < 0.1%
 
26 5 < 0.1%
 

children
Real number (ℝ≥0)

ZEROS
Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.09237486785862022
Minimum: 0.0
Maximum: 10.0
Zeros: 101830
Zeros (%): 93.6%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 10
Range: 10
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3773091009
Coefficient of variation (CV): 4.084542794
Kurtosis: 22.19372655
Mean: 0.09237486786
Mean Absolute Deviation (MAD): 0.1729380483
Skewness: 4.416691082
Sum: 10049
Variance: 0.1423621576
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 1.5 2.5 6.5 10. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 101830 93.6%
 
1 3922 3.6%
 
2 2979 2.7%
 
3 53 < 0.1%
 
10 1 < 0.1%
 
ValueCountFrequency (%) 
0 101830 93.6%
 
1 3922 3.6%
 
2 2979 2.7%
 
3 53 < 0.1%
 
10 1 < 0.1%
 
ValueCountFrequency (%) 
10 1 < 0.1%
 
3 53 < 0.1%
 
2 2979 2.7%
 
1 3922 3.6%
 
0 101830 93.6%
 

babies
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.007951463896676931
Minimum: 0
Maximum: 10
Zeros: 107951
Zeros (%): 99.2%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 10
Range: 10
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.09815748067
Coefficient of variation (CV): 12.34457981
Kurtosis: 1731.883529
Mean: 0.007951463897
Mean Absolute Deviation (MAD): 0.01578100803
Skewness: 25.60230831
Sum: 865
Variance: 0.009634891011
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 5.5 10. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 107951 99.2%
 
1 818 0.8%
 
2 14 < 0.1%
 
10 1 < 0.1%
 
9 1 < 0.1%
 
ValueCountFrequency (%) 
0 107951 99.2%
 
1 818 0.8%
 
2 14 < 0.1%
 
9 1 < 0.1%
 
10 1 < 0.1%
 
ValueCountFrequency (%) 
10 1 < 0.1%
 
9 1 < 0.1%
 
2 14 < 0.1%
 
1 818 0.8%
 
0 107951 99.2%
 

meal
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 850.0 KiB
ValueCountFrequency (%) 
BB 85099 78.2%
 
HB 12705 11.7%
 
SC 10238 9.4%
 
FB 743 0.7%
 

Length

Max length: 2
Mean length: 2
Min length: 2
ValueCountFrequency (%) 
Uppercase_Letter 5 100.0%
 
ValueCountFrequency (%) 
Latin 5 100.0%
 
ValueCountFrequency (%) 
ASCII 5 100.0%
 

country
Categorical

HIGH CARDINALITY
Distinct count: 173
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 850.0 KiB
ValueCountFrequency (%) 
PRT 45865 42.2%
 
GBR 10764 9.9%
 
FRA 9498 8.7%
 
ESP 7614 7.0%
 
DEU 6387 5.9%
 
ITA 3407 3.1%
 
IRL 3053 2.8%
 
BEL 1992 1.8%
 
BRA 1989 1.8%
 
NLD 1910 1.8%
 
Other values (163) 16306 15.0%
 

Length

Max length: 7
Mean length: 3.008080158
Min length: 2
ValueCountFrequency (%) 
Uppercase_Letter 26 86.7%
 
Lowercase_Letter 4 13.3%
 
ValueCountFrequency (%) 
Latin 30 100.0%
 
ValueCountFrequency (%) 
ASCII 30 100.0%
 

market_segment
Categorical

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 850.0 KiB
ValueCountFrequency (%) 
Online TA 49545 45.5%
 
Offline TA/TO 22682 20.9%
 
Groups 19354 17.8%
 
Direct 11196 10.3%
 
Corporate 5079 4.7%
 
Complementary 697 0.6%
 
Aviation 230 0.2%
 
Undefined 2 < 0.1%
 

Length

Max length: 13
Mean length: 9.015038838
Min length: 6
ValueCountFrequency (%) 
Lowercase_Letter 17 65.4%
 
Uppercase_Letter 7 26.9%
 
Other_Punctuation 1 3.8%
 
Space_Separator 1 3.8%
 
ValueCountFrequency (%) 
Latin 24 92.3%
 
Common 2 7.7%
 
ValueCountFrequency (%) 
ASCII 26 100.0%
 

distribution_channel
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 850.0 KiB
ValueCountFrequency (%) 
TA/TO 88940 81.8%
 
Direct 13203 12.1%
 
Corporate 6449 5.9%
 
GDS 188 0.2%
 
Undefined 5 < 0.1%
 

Length

Max length: 9
Mean length: 5.355223606
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 11 55.0%
 
Uppercase_Letter 8 40.0%
 
Other_Punctuation 1 5.0%
 
ValueCountFrequency (%) 
Latin 19 95.0%
 
Common 1 5.0%
 
ValueCountFrequency (%) 
ASCII 20 100.0%
 

is_repeated_guest
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 850.0 KiB
ValueCountFrequency (%) 
0 105245 96.7%
 
1 3540 3.3%
 

previous_cancellations
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 15
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.09440639794089259
Minimum: 0
Maximum: 26
Zeros: 102366
Zeros (%): 94.1%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 26
Range: 26
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8818788936
Coefficient of variation (CV): 9.341304327
Kurtosis: 620.470582
Mean: 0.09440639794
Mean Absolute Deviation (MAD): 0.177671652
Skewness: 23.5072585
Sum: 10270
Variance: 0.7777103829
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 5.5 20. 22.5 25.5 26. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 102366 94.1%
 
1 6008 5.5%
 
2 110 0.1%
 
3 62 0.1%
 
24 48 < 0.1%
 
11 35 < 0.1%
 
4 28 < 0.1%
 
26 26 < 0.1%
 
25 25 < 0.1%
 
19 19 < 0.1%
 
Other values (5) 58 0.1%
 
ValueCountFrequency (%) 
0 102366 94.1%
 
1 6008 5.5%
 
2 110 0.1%
 
3 62 0.1%
 
4 28 < 0.1%
 
ValueCountFrequency (%) 
26 26 < 0.1%
 
25 25 < 0.1%
 
24 48 < 0.1%
 
21 1 < 0.1%
 
19 19 < 0.1%
 

previous_bookings_not_canceled
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 68
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.13695822034287816
Minimum: 0
Maximum: 67
Zeros: 105388
Zeros (%): 96.9%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 67
Range: 67
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.448224119
Coefficient of variation (CV): 10.57420369
Kurtosis: 699.3621315
Mean: 0.1369582203
Mean Absolute Deviation (MAD): 0.2653629255
Skewness: 22.49048645
Sum: 14899
Variance: 2.097353099
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 7.5 11.5 16.5 28.5 67. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 105388 96.9%
 
1 1452 1.3%
 
2 549 0.5%
 
3 316 0.3%
 
4 223 0.2%
 
5 168 0.2%
 
6 104 0.1%
 
7 80 0.1%
 
8 65 0.1%
 
9 56 0.1%
 
Other values (58) 384 0.4%
 
ValueCountFrequency (%) 
0 105388 96.9%
 
1 1452 1.3%
 
2 549 0.5%
 
3 316 0.3%
 
4 223 0.2%
 
ValueCountFrequency (%) 
67 1 < 0.1%
 
66 1 < 0.1%
 
65 1 < 0.1%
 
64 1 < 0.1%
 
63 1 < 0.1%
 

reserved_room_type
Categorical

Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 850.0 KiB
ValueCountFrequency (%) 
A 79728 73.3%
 
D 16771 15.4%
 
E 5708 5.2%
 
F 2536 2.3%
 
G 1796 1.7%
 
B 1041 1.0%
 
C 689 0.6%
 
H 510 0.5%
 
L 6 < 0.1%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Uppercase_Letter 9 100.0%
 
ValueCountFrequency (%) 
Latin 9 100.0%
 
ValueCountFrequency (%) 
ASCII 9 100.0%
 

assigned_room_type
Categorical

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 850.0 KiB
ValueCountFrequency (%) 
A 68162 62.7%
 
D 22832 21.0%
 
E 6967 6.4%
 
F 3354 3.1%
 
G 2226 2.0%
 
C 2068 1.9%
 
B 2068 1.9%
 
H 616 0.6%
 
I 327 0.3%
 
K 164 0.2%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Uppercase_Letter 11 100.0%
 
ValueCountFrequency (%) 
Latin 11 100.0%
 
ValueCountFrequency (%) 
ASCII 11 100.0%
 

booking_changes
Real number (ℝ≥0)

ZEROS
Distinct count: 18
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.21174794319069726
Minimum: 0
Maximum: 18
Zeros: 92781
Zeros (%): 85.3%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 18
Range: 18
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.6268624708
Coefficient of variation (CV): 2.960418228
Kurtosis: 68.38416753
Mean: 0.2117479432
Mean Absolute Deviation (MAD): 0.3611929203
Skewness: 5.662330433
Sum: 23035
Variance: 0.3929565573
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 5.5 6.5 7.5 9.5 18. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 92781 85.3%
 
1 11397 10.5%
 
2 3285 3.0%
 
3 783 0.7%
 
4 317 0.3%
 
5 104 0.1%
 
6 55 0.1%
 
7 24 < 0.1%
 
8 12 < 0.1%
 
9 8 < 0.1%
 
Other values (8) 19 < 0.1%
 
ValueCountFrequency (%) 
0 92781 85.3%
 
1 11397 10.5%
 
2 3285 3.0%
 
3 783 0.7%
 
4 317 0.3%
 
ValueCountFrequency (%) 
18 1 < 0.1%
 
17 2 < 0.1%
 
16 2 < 0.1%
 
15 3 < 0.1%
 
14 3 < 0.1%
 

agent
Real number (ℝ≥0)

ZEROS
Distinct count: 331
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73.22889185089856
Minimum: 0.0
Maximum: 535.0
Zeros: 15530
Zeros (%): 14.3%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 6
Median: 9
Q3: 142
95-th percentile: 250
Maximum: 535
Range: 535
Interquartile range (IQR): 136

Descriptive statistics

Standard deviation: 106.0426083
Coefficient of variation (CV): 1.448097952
Kurtosis: 0.5632151831
Mean: 73.22889185
Mean Absolute Deviation (MAD): 89.24863688
Skewness: 1.327094656
Sum: 7966205
Variance: 11245.03477
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000e+00 5.000e-01 1.500e+00 2.500e+00 3.500e+00 ... 4.995e+02 5.050e+02 5.265e+02 5.330e+02 5.350e+02], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
9 28036 25.8%
 
0 15530 14.3%
 
240 12282 11.3%
 
1 7187 6.6%
 
7 3257 3.0%
 
14 3077 2.8%
 
6 2935 2.7%
 
250 2433 2.2%
 
28 1597 1.5%
 
241 1432 1.3%
 
Other values (321) 31019 28.5%
 
ValueCountFrequency (%) 
0 15530 14.3%
 
1 7187 6.6%
 
2 161 0.1%
 
3 1332 1.2%
 
4 47 < 0.1%
 
ValueCountFrequency (%) 
535 3 < 0.1%
 
531 30 < 0.1%
 
527 35 < 0.1%
 
526 5 < 0.1%
 
510 2 < 0.1%
 

company
Real number (ℝ≥0)

ZEROS
Distinct count: 341
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.422411178011675
Minimum: 0.0
Maximum: 539.0
Zeros: 102214
Zeros (%): 94.0%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 45
Maximum: 539
Range: 539
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 55.20809832
Coefficient of variation (CV): 4.833313865
Kurtosis: 34.73977106
Mean: 11.42241118
Mean Absolute Deviation (MAD): 21.46675838
Skewness: 5.688207478
Sum: 1242587
Variance: 3047.93412
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 3. 8.5 9.5 19. ... 497. 498.5 502.5 526.5 539. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 102214 94.0%
 
40 853 0.8%
 
223 784 0.7%
 
67 267 0.2%
 
45 239 0.2%
 
153 208 0.2%
 
174 147 0.1%
 
281 137 0.1%
 
219 132 0.1%
 
154 127 0.1%
 
Other values (331) 3677 3.4%
 
ValueCountFrequency (%) 
0 102214 94.0%
 
6 1 < 0.1%
 
8 1 < 0.1%
 
9 37 < 0.1%
 
10 1 < 0.1%
 
ValueCountFrequency (%) 
539 1 < 0.1%
 
534 2 < 0.1%
 
530 4 < 0.1%
 
528 1 < 0.1%
 
525 6 < 0.1%
 

days_in_waiting_list
Real number (ℝ≥0)

ZEROS
Distinct count: 127
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.540249115227283
Minimum: 0
Maximum: 391
Zeros: 105101
Zeros (%): 96.6%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 391
Range: 391
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 18.40151145
Coefficient of variation (CV): 7.243979082
Kurtosis: 170.5080725
Mean: 2.540249115
Mean Absolute Deviation (MAD): 4.908808498
Skewness: 11.41503655
Sum: 276341
Variance: 338.6156238
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 2.5 3.5 4.5 ... 219. 223.5 247.5 385. 391. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 105101 96.6%
 
39 227 0.2%
 
58 164 0.2%
 
44 141 0.1%
 
31 127 0.1%
 
35 96 0.1%
 
46 94 0.1%
 
69 89 0.1%
 
63 83 0.1%
 
50 80 0.1%
 
Other values (117) 2583 2.4%
 
ValueCountFrequency (%) 
0 105101 96.6%
 
1 11 < 0.1%
 
2 5 < 0.1%
 
3 59 0.1%
 
4 25 < 0.1%
 
ValueCountFrequency (%) 
391 45 < 0.1%
 
379 15 < 0.1%
 
330 15 < 0.1%
 
259 10 < 0.1%
 
236 35 < 0.1%
 

customer_type
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 850.0 KiB
ValueCountFrequency (%) 
Transient 80589 74.1%
 
Transient-Party 23808 21.9%
 
Contract 3864 3.6%
 
Group 524 0.5%
 

Length

Max length: 15
Mean length: 10.25833525
Min length: 5
ValueCountFrequency (%) 
Lowercase_Letter 12 70.6%
 
Uppercase_Letter 4 23.5%
 
Dash_Punctuation 1 5.9%
 
ValueCountFrequency (%) 
Latin 16 94.1%
 
Common 1 5.9%
 
ValueCountFrequency (%) 
ASCII 17 100.0%
 

adr
Real number (ℝ)

ZEROS
Distinct count: 7949
Unique (%): 7.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 97.08540727122305
Minimum: -6.38
Maximum: 5400.0
Zeros: 1742
Zeros (%): 1.6%
Memory size: 850.0 KiB

Quantile statistics

Minimum: -6.38
5-th percentile: 37.83
Q1: 66.5
Median: 90
Q3: 120
95-th percentile: 178
Maximum: 5400
Range: 5406.38
Interquartile range (IQR): 53.5

Descriptive statistics

Standard deviation: 46.91065887
Coefficient of variation (CV): 0.4831895976
Kurtosis: 1502.090978
Mean: 97.08540727
Mean Absolute Deviation (MAD): 33.40646664
Skewness: 14.04442566
Sum: 10561436.03
Variance: 2200.609916
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-6.38000e+00 -3.19000e+00 1.30000e-01 5.62500e+00 6.25000e+00 ... 3.05500e+02 3.40355e+02 3.83000e+02 5.09000e+02 5.40000e+03], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
62 3754 3.5%
 
75 2701 2.5%
 
65 2369 2.2%
 
90 2336 2.1%
 
80 1826 1.7%
 
0 1742 1.6%
 
95 1617 1.5%
 
100 1556 1.4%
 
85 1525 1.4%
 
120 1481 1.4%
 
Other values (7939) 87878 80.8%
 
ValueCountFrequency (%) 
-6.38 1 < 0.1%
 
0 1742 1.6%
 
0.26 1 < 0.1%
 
0.5 1 < 0.1%
 
1 14 < 0.1%
 
ValueCountFrequency (%) 
5400 1 < 0.1%
 
510 1 < 0.1%
 
508 1 < 0.1%
 
451.5 1 < 0.1%
 
384 1 < 0.1%
 

required_car_parking_spaces
Real number (ℝ≥0)

ZEROS
Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.06309693431998897
Minimum: 0
Maximum: 8
Zeros: 101966
Zeros (%): 93.7%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2465175647
Coefficient of variation (CV): 3.906965804
Kurtosis: 31.21274107
Mean: 0.06309693432
Mean Absolute Deviation (MAD): 0.1182836238
Skewness: 4.181996672
Sum: 6864
Variance: 0.06077090972
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.5 1.5 2.5 8. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 101966 93.7%
 
1 6789 6.2%
 
2 25 < 0.1%
 
3 3 < 0.1%
 
8 2 < 0.1%
 
ValueCountFrequency (%) 
0 101966 93.7%
 
1 6789 6.2%
 
2 25 < 0.1%
 
3 3 < 0.1%
 
8 2 < 0.1%
 
ValueCountFrequency (%) 
8 2 < 0.1%
 
3 3 < 0.1%
 
2 25 < 0.1%
 
1 6789 6.2%
 
0 101966 93.7%
 

total_of_special_requests
Real number (ℝ≥0)

ZEROS
Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5442386358413385
Minimum: 0
Maximum: 5
Zeros: 65912
Zeros (%): 60.6%
Memory size: 850.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 2
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7771496217
Coefficient of variation (CV): 1.427957463
Kurtosis: 1.588764744
Mean: 0.5442386358
Mean Absolute Deviation (MAD): 0.6595000591
Skewness: 1.390769741
Sum: 59205
Variance: 0.6039615345
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 1.5 2.5 3.5 4.5 5. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 65912 60.6%
 
1 29218 26.9%
 
2 11295 10.4%
 
3 2076 1.9%
 
4 251 0.2%
 
5 33 < 0.1%
 
ValueCountFrequency (%) 
0 65912 60.6%
 
1 29218 26.9%
 
2 11295 10.4%
 
3 2076 1.9%
 
4 251 0.2%
 
ValueCountFrequency (%) 
5 33 < 0.1%
 
4 251 0.2%
 
3 2076 1.9%
 
2 11295 10.4%
 
1 29218 26.9%
 

arrival_date
Categorical

HIGH CARDINALITY
TYPE DATE
Distinct count: 730
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 850.0 KiB
ValueCountFrequency (%) 
2015-12-05 448 0.4%
 
2016-11-07 365 0.3%
 
2015-10-16 355 0.3%
 
2016-10-13 344 0.3%
 
2015-09-18 340 0.3%
 
2017-06-08 337 0.3%
 
2017-03-02 335 0.3%
 
2016-10-28 332 0.3%
 
2015-09-17 331 0.3%
 
2017-04-29 330 0.3%
 
Other values (720) 105268 96.8%
 

Length

Max length: 10
Mean length: 10
Min length: 10
ValueCountFrequency (%) 
Decimal_Number 10 90.9%
 
Dash_Punctuation 1 9.1%
 
ValueCountFrequency (%) 
Common 11 100.0%
 
ValueCountFrequency (%) 
ASCII 11 100.0%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the slope does not affect r; only its sign does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
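
The calculation described above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the profiler's own code; the function name `pearson_r` and the sample values are assumptions made for the example.

```python
import numpy as np

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov_xy = np.cov(x, y, ddof=1)[0, 1]          # sample covariance
    return cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# Made-up sample values, roughly linear in x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = pearson_r(x, y)
# Shifting and rescaling x by a positive factor leaves r unchanged.
r_shifted = pearson_r(3 * x + 7, y)
print(r, r_shifted)
```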

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
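
As a rough sketch of that calculation, the example below ranks both variables and then applies the covariance formula to the ranks. It is not the profiler's implementation; the function name `spearman_rho` is hypothetical, and ties are assumed to receive average ranks (the default of pandas' `Series.rank`).

```python
import numpy as np
import pandas as pd

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx = pd.Series(x).rank()   # average ranks for ties (assumption)
    ry = pd.Series(y).rank()
    cov = np.cov(rx, ry, ddof=1)[0, 1]
    return cov / (rx.std(ddof=1) * ry.std(ddof=1))

# A monotonic but nonlinear relationship: y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
print(rho)
```

Because the relationship is strictly increasing, the ranks of x and y coincide, so ρ reaches its maximum even though the relationship is not linear.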

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
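
The pair-counting definition can be written out directly. The sketch below is a plain O(n²) illustration of the formula as stated (often called tau-a), with a hypothetical function name; library implementations commonly use tie-adjusted variants, so results may differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means a tie in x or y; it counts toward neither group.
    return (concordant - discordant) / len(pairs)

# Made-up example: one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```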

Missing values

Sample

First rows

id hotel is_canceled lead_time stays_in_weekend_nights stays_in_week_nights adults children babies meal country market_segment distribution_channel is_repeated_guest previous_cancellations previous_bookings_not_canceled reserved_room_type assigned_room_type booking_changes agent company days_in_waiting_list customer_type adr required_car_parking_spaces total_of_special_requests arrival_date
01Resort Hotel010120.01BBPRTDirectDirect000AA1250.00.00Transient55.00102017-02-04
12Resort Hotel010210.00BBPRTDirectDirect000AD00.00.00Transient40.00002016-03-04
23City Hotel1560320.00BBPRTGroupsTA/TO000AA029.00.00Transient60.00002017-02-01
34City Hotel02132520.00BBGBROnline TATA/TO000AA09.00.00Transient102.85012016-10-05
45Resort Hotel032841020.00BBGBROffline TA/TOTA/TO000DD0243.00.00Contract121.50022017-06-29
56Resort Hotel0572420.00BBGBROffline TA/TOTA/TO000AA026.00.00Transient39.10002016-04-07
67City Hotel1442320.00BBNLDOnline TATA/TO000DD29.00.00Transient113.52002016-04-10
78Resort Hotel001020.00HBNLDDirectDirect000EE10.00.00Transient84.00002016-03-27
89Resort Hotel041420.00BBESPOffline TA/TOTA/TO000CC0142.00.00Transient102.24102015-09-14
910Resort Hotel0401210.00SCPRTGroupsDirect000AA00.00.00Transient-Party55.00002017-01-16

Last rows

id hotel is_canceled lead_time stays_in_weekend_nights stays_in_week_nights adults children babies meal country market_segment distribution_channel is_repeated_guest previous_cancellations previous_bookings_not_canceled reserved_room_type assigned_room_type booking_changes agent company days_in_waiting_list customer_type adr required_car_parking_spaces total_of_special_requests arrival_date
108775108776City Hotel1652320.00BBGBROnline TATA/TO000AA09.00.00Transient73.95022016-03-03
108776108777City Hotel0601030.00BBDEUOnline TATA/TO000DE09.00.00Transient137.70022016-07-25
108777108778City Hotel1302120.00HBPRTGroupsTA/TO000AA01.00.00Transient-Party86.00002015-08-08
108778108779City Hotel0951220.00BBDEUOffline TA/TOTA/TO000AA0168.00.00Transient80.75002016-05-06
108779108780Resort Hotel11022520.00BBPRTOnline TATA/TO000AA0240.00.00Transient80.00002016-05-31
108780108781City Hotel1370320.00BBPRTOffline TA/TOTA/TO000AA056.00.00Transient-Party105.00002016-10-13
108781108782City Hotel0131220.00BBFRAOffline TA/TOTA/TO000AA028.00.00Transient82.00002016-01-11
108782108783City Hotel01240320.00BBFRAOnline TATA/TO000AA29.00.00Transient126.00012017-04-18
108783108784Resort Hotel11302520.00BBPRTOnline TATA/TO000AA0240.00.00Transient120.60022015-08-01
108784108785Resort Hotel01742520.00HBGBROffline TA/TOTA/TO000AA0243.00.00Contract58.23002015-09-24